We propose a quantum algorithm for closest pattern matching which allows us to search for as
many distinct patterns as we wish in a given string (database), requiring a query function per
symbol of the pattern alphabet. This represents a significant practical advantage when compared to
Grover's search algorithm as well as to other quantum pattern matching methods, which rely on
building specific queries for particular patterns. Our method makes arbitrary searches on long static
databases much more realistic and implementable. Our algorithm, inspired by Grover's, returns the
position of the closest substring to a given pattern of size $M$ with non-negligible probability in
$O(\sqrt{N})$ queries, where $N$ is the size of the string. Furthermore, we give the full recipe to
implement our algorithm (together with its total circuit complexity), thus offering an oracle-based
quantum algorithm ready to be implemented.

PACS numbers: 03.67.-a, 03.67.Lx, 03.67.Mn 



Search in databases is nowadays a common and fundamental application in computer science,
one that we use daily to find a word in a text or a site on Google. We are currently also living
through the quantum information revolution, where the idea of encoding information in quantum
systems offers a radically new type of information, allowing for much more secure communications
and much faster computations than we have been able to achieve so far using (now-called)
classical information. In particular, quantum cryptography has in recent years quickly progressed
from a "beautiful idea" to a plug-and-play application that one can purchase. A few quantum
algorithms have also been proposed (the most significant probably being Shor's efficient
factorization algorithm of 1994, solving a problem that is believed to be classically intractable),
even though the construction of a scalable quantum computer is still a challenge, presently being
tackled with a plethora of different technologies. Yet, should quantum computation become a
reality, there is still no implementable efficient quantum algorithm to search a given database,
despite Grover's celebrated quantum search algorithm proposed in 1996. Grover's work, which now
constitutes a paradigm for quantum search algorithms, offers a quadratic speed-up in query
complexity (i.e. calls of a query function) when compared to the classical case.
However, in the real execution of these search algorithms, we must distinguish the compile time
from the run time. The compile time is essentially the construction of the query function on which
the algorithm relies to identify the element being searched. But this construction is, in general,
not a negligible task. In particular, for database search, we must go through all the database
elements to build the so-called oracle, so that we can then implement the search. Note that this
makes the quantum search irrelevant in practical terms, since one needs to know the solution in
order to run it. Moreover, a query function built to find a particular element can only be used
again to find that very same element. The search for a different item in the same database would
require building a new specific query function. All this represents a serious obstacle to the
application of current search algorithms.

To address this problem, we propose a quantum algorithm for pattern matching which allows us to
search for as many distinct patterns as we wish in a given unsorted string (database), and
moreover returns the position of the closest substring to a given pattern with non-negligible
probability in $O(\sqrt{N})$ queries, where $N$ is the size of the string. This means that the time
to find the closest match (a much harder problem than finding the exact match, as we shall see)
does not depend on the size of the pattern itself, a result with no classical equivalent. Another
crucial point is that our quantum algorithm is actually useful and implementable to perform
searches in (unsorted) databases. For this, we introduce a query function per symbol of the pattern
alphabet, which requires a significant (though clearly efficient) pre-processing, but allows us to
perform an arbitrary number of different searches in a static database. This is a compile once,
run many approach, yielding a new search algorithm that not only settles the previously existing
implementation problems, but even offers the solution of a more general problem, and with a very
interesting speed-up. After presenting our algorithm in detail, together with its analysis in the
most significant limit (when the pattern is much smaller than the text and not frequent), we give
the explicit recipe for the construction of the query functions and of our non-trivial initial
state, including their circuit complexity analysis. But let us start by briefly reviewing what is
known classically about the pattern matching problem.

In the classical setting, the best known algorithm for the closest substring problem takes
$O(MN)$ queries, where $M$ is the size of the pattern. This result follows from adapting the best
known algorithm for approximate pattern matching, which takes $O(eN + M)$ where $e$ is the number
of allowed errors, and taking $e = M - 1$, that is, the closest match could be a substring that
coincides with the pattern in just one letter. The closest match problem should not be confused
with (exact) pattern matching, where the problem consists in determining whether a certain word
(pattern) is a substring of a text. For exact pattern matching it is proven that the best
algorithm can achieve $O(M + N)$. However, in practical cases where data can mutate over time,
like DNA, or is stored in a faulty system, the closest match problem is much more relevant, since
sometimes only approximations of the pattern exist, but they nevertheless need to be found.
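
As a point of reference for the $O(MN)$ classical cost, the following is a minimal brute-force sketch of the closest substring problem. It assumes that "closest" means the smallest number of mismatching letters (Hamming distance) between the pattern and a window of the string; the function name and the 1-based position convention are illustrative choices, not taken from the paper.

```python
def closest_substring(w: str, p: str) -> tuple:
    """Return (position, mismatches) of the window of w closest to p.

    Assumption: closeness is Hamming distance over windows of length len(p).
    Positions are 1-based. Cost is O(M * N) letter comparisons.
    """
    n, m = len(w), len(p)
    if m == 0 or m > n:
        raise ValueError("pattern must be non-empty and no longer than the string")
    best_pos, best_dist = 1, m + 1
    for k in range(n - m + 1):
        # Count mismatching letters between the window starting at k and p.
        dist = sum(1 for a, b in zip(w[k:k + m], p) if a != b)
        if dist < best_dist:
            best_pos, best_dist = k + 1, dist
    return best_pos, best_dist
```

Ties are resolved in favour of the leftmost window, which is an arbitrary choice of this sketch.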

Our algorithm is based on the modified Grover search algorithm proposed in [10] for the case of
multiple solutions. It uses the techniques originally introduced by Grover: a query operator that
marks the state encoding the database element being searched by changing its phase, followed by
an amplitude amplification of the marked state. The state can be detected with non-negligible
probability by iterating this process $\sqrt{N}$ times, where $N$ is the size of the database.

Let us now describe our closest pattern matching algorithm. Given a string $w$ of size $N$ over
an alphabet $\Sigma$, we want to know if a certain pattern $p$ of size $M$ occurs in $w$, or at
least obtain the closest match to $p$ in $w$. In particular we want to find the position
$i \in \{1, \ldots, N\}$ where a certain symbol of $p$ occurs in $w$. To this end, we encode
position $i$ in a unit vector $|i\rangle$ of a Hilbert space $\mathcal{H}$ of dimension $N$ (where
the set $B = \{|1\rangle, \ldots, |N\rangle\}$ constitutes an orthonormal basis of $\mathcal{H}$).
Since we are considering patterns of size $M$, the total search space will be
$\mathcal{H}^{\otimes M}$.

The initial state of the total system reflects the fact that we want the second symbol of $p$ to
occur just after the first, the third to occur just after the second, and so on. For this reason
we consider the following initial entangled state, which consists of a uniform superposition of
all possible states fulfilling this property:

$$|\psi_0\rangle = \frac{1}{\sqrt{N-M+1}} \sum_{k=1}^{N-M+1} |k, k+1, \ldots, k+M-1\rangle, \qquad (1)$$

thus restricting $\mathcal{H}^{\otimes M}$ to a subspace of dimension $N - M + 1$.
$|\psi_0\rangle$ can easily be adapted to patterns with gaps.

To perform the search, we now need to define a query operator for each symbol $\sigma$ of the
alphabet $\Sigma$. We will thus have $|\Sigma|$ different query operators. Each acts over
$\mathcal{H} \otimes \mathcal{H}_2$ (where $\mathcal{H}_2$ is the Hilbert space of dimension 2) as
follows:

$$T_\sigma(|i\rangle \otimes |b\rangle) = |i\rangle \otimes |f_\sigma(i) \oplus b\rangle, \qquad (2)$$

where $|i\rangle$ encodes position $i$, $|b\rangle$ is an auxiliary qubit, and $f_\sigma$ is a
function such that:

$$f_\sigma(i) = \begin{cases} 1 & \text{if the } i\text{-th letter of } w \text{ is } \sigma \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$

As in Grover's algorithm, we want to use the query to mark states where there is a match for the
individual symbol, in particular by shifting the phase of the respective state, as given by the
following unitary transformation:

$$Q_\sigma |k\rangle = (-1)^{f_\sigma(k)} |k\rangle, \qquad (4)$$

where $|k\rangle \in B$.

However, in our quantum pattern matching algorithm a query operator will be applied for a random
symbol of the pattern to the corresponding position. Hence, on average, a position with a partial
match, say of $M'$ out of $M$ matches of individual symbols, will have its phase shifted in a
fraction $M'/M$ of the iterations. Note that the more matches we obtain, the more often the phase
will be shifted, and consequently the more the amplitude will be amplified. Observe that for a
given string there might be full and partial matches, leading to larger and smaller amplitude
amplifications respectively (see Fig. 1 for an example).

FIG. 1: Simulation of our algorithm for a random string of size $N = 212$ and a particular
pattern of size $M = 10$ occurring towards the end of the string.



Note that, if $N \gg M$, which is usually the case, sampling randomly over $M$ elements
$\sqrt{N}$ times will lead, with very high probability, to searching over all elements of the
pattern; that is, as $N$ grows, this probability tends to 1 exponentially fast.
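
The coverage claim above can be checked with a short calculation. The sketch below computes, by inclusion-exclusion, the probability that $r$ independent uniform draws from $M$ pattern positions hit every position at least once; it assumes exactly $r$ draws are made (for instance $r = \sqrt{N}$), and the function name is illustrative.

```python
from math import comb


def prob_all_symbols_sampled(m: int, r: int) -> float:
    """Probability that r uniform draws from {1, ..., m} cover all m values.

    Inclusion-exclusion over the set of values that are never drawn:
    sum_j (-1)^j * C(m, j) * (1 - j/m)^r.
    """
    return sum((-1) ** j * comb(m, j) * (1 - j / m) ** r for j in range(m + 1))
```

The probability of missing some position is bounded by $M (1 - 1/M)^r \le M e^{-r/M}$, which decays exponentially in $r = \sqrt{N}$ for fixed $M$.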

The amplitude amplification is obtained by applying the usual Grover diffusion
$D = D_N \otimes I^{\otimes (M-1)}$ to the total state, where:

$$D_N = 2|\varphi\rangle\langle\varphi| - I, \qquad (5)$$

$I$ is the identity operator of dimension $N$ and $|\varphi\rangle \in \mathcal{H}$ is given by
the uniform superposition $|\varphi\rangle = \sum_{i=1}^{N} \frac{1}{\sqrt{N}} |i\rangle$.

The algorithm then consists of iterating the phase shift induced by the query followed by
amplitude amplification. The final step is to measure the state of a symbol of the pattern over
the predefined basis $B$, yielding the position of the closest match of the pattern in the
string. We show that it is enough to iterate $\sqrt{N}$ times in order to observe a match of the
pattern with non-negligible probability.

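In summary, the algorithm will be as follows:

Input: $w \in \Sigma^*$ and $p \in \Sigma^*$
Output: $m \in \mathbb{N}$

Quantum variables: $|\psi\rangle \in \mathcal{H}(\{1, \ldots, N\})^{\otimes M}$
Classical variables: $r, i, j \in \mathbb{N}$

1. choose $r \in [0, \lfloor \sqrt{N - M + 1} \rfloor]$ uniformly;

2. set $|\psi\rangle = \sum_{k=1}^{N-M+1} \frac{1}{\sqrt{N-M+1}} |k, k+1, \ldots, k+M-1\rangle$;

3. for $i = 1$ to $r$

(a) choose $j \in [1, M]$ uniformly;

(b) set $|\psi\rangle = I^{\otimes (j-1)} \otimes Q_{p_j} \otimes I^{\otimes (M-j)} |\psi\rangle$;

(c) set $|\psi\rangle = D|\psi\rangle$;

4. set $m$ to the result of the measurement of the first
component of $|\psi\rangle$ over the basis $\{|1\rangle, \ldots, |N\rangle\}$.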
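
The steps listed above can be transcribed literally into a dense state-vector simulation for very small $N$ and $M$. The sketch below is only a bookkeeping aid under stated assumptions: it stores the full $N^M$-dimensional vector (so it is impractical beyond toy sizes), it uses NumPy, all function names are illustrative, and it makes no claim about the success probabilities obtained.

```python
import numpy as np


def basis_index(positions, n):
    """Index of |p_1, ..., p_M> (1-based positions), first register most significant."""
    idx = 0
    for pos in positions:
        idx = idx * n + (pos - 1)
    return idx


def initial_state(n, m):
    """Uniform superposition of |k, k+1, ..., k+M-1> for k = 1..N-M+1, as in equation (1)."""
    psi = np.zeros(n ** m)
    for k in range(1, n - m + 2):
        psi[basis_index(range(k, k + m), n)] = 1.0
    return psi / np.linalg.norm(psi)


def phase_query(w, symbol):
    """Diagonal N x N operator with entry -1 where w holds `symbol`, as in equation (4)."""
    return np.diag([-1.0 if c == symbol else 1.0 for c in w])


def diffusion(n):
    """2|phi><phi| - I with |phi> the uniform superposition, as in equation (5)."""
    phi = np.full((n, 1), 1.0 / np.sqrt(n))
    return 2.0 * (phi @ phi.T) - np.eye(n)


def on_register(op, j, n, m):
    """Embed an N x N operator on register j (1-based) of M registers."""
    full = np.eye(1)
    for reg in range(1, m + 1):
        full = np.kron(full, op if reg == j else np.eye(n))
    return full


def run(w, p, rng):
    """One run of steps 1-4; returns the measured position of the first register."""
    n, m = len(w), len(p)
    r = rng.integers(0, int(np.floor(np.sqrt(n - m + 1))) + 1)      # step 1
    psi = initial_state(n, m)                                       # step 2
    big_d = on_register(diffusion(n), 1, n, m)
    for _ in range(r):                                              # step 3
        j = rng.integers(1, m + 1)                                  # (a)
        psi = on_register(phase_query(w, p[j - 1]), j, n, m) @ psi  # (b)
        psi = big_d @ psi                                           # (c)
    # Step 4: marginal distribution of the first register, then sample from it.
    probs = (psi.reshape(n, -1) ** 2).sum(axis=1)
    return int(rng.choice(np.arange(1, n + 1), p=probs / probs.sum()))
```

A call such as `run("abcabdab", "bd", np.random.default_rng(0))` returns a position between 1 and 8; repeating the call with different seeds gives an empirical distribution over positions for that toy instance.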

The analysis of the algorithm follows closely that proposed in [10]. The proof for the case where
only exact matches exist, and the symbols in the pattern occur only in these matches, can be
adapted straightforwardly, and states that the probability of finding a solution using the
algorithm above is at least $1/4$. When symbols also occur elsewhere, we are in the closest match
setting, which we analyze next.

Assume that the alphabet is rich enough and that the symbols of the pattern do not occur very
often; if this is not the case, one can combine letters in pairs or triples. Moreover, if $N$ is
very large and $N \gg M$, then the average amplitude is around $1/\sqrt{N}$. Note that, in this
case, each step of Grover amplification amplifies an amplitude $a$ by $2/\sqrt{N}$, since first it
inverts the amplitude to $-a$ and then applies the diffusion $D$, or inversion around the average
operator, which gives $2/\sqrt{N} + a$. Now, if the pattern occurs at a position $k$, then the
random choice of $j$ at step 3(a) will always lead to an amplitude amplification at that position.
If there is a partial match, say $M'$ symbols out of $M$, then, on average, the amplification will
take place in a fraction $M'/M$ of the iterations.

We are now able to state that the amplitude amplification for a match of $M'$ out of $M$ symbols
is on average a fraction $M'/M$ of the amplification for a perfect match, since $2/\sqrt{N}$ is
added to the amplitude in only a fraction $M'/M$ of the iterations.
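
The arithmetic of a single amplification step can be checked numerically on one register. The sketch below assumes a plain $N$-dimensional amplitude vector, flips the sign of the marked entries, and then applies inversion about the average, which is how $2|\varphi\rangle\langle\varphi| - I$ acts on a vector of amplitudes; the function name is illustrative.

```python
import numpy as np


def grover_step(amplitudes, marked):
    """Phase-flip the marked entries, then invert every entry about the average."""
    a = np.array(amplitudes, dtype=float)
    a[marked] *= -1.0
    # (2|phi><phi| - I) a = 2 * mean(a) - a, entry by entry.
    return 2.0 * a.mean() - a
```

Starting from the uniform vector with entries $1/\sqrt{N}$ and one marked entry, the marked amplitude grows by $2(N-2)/(N\sqrt{N})$, which approaches $2/\sqrt{N}$ for large $N$.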



Assuming an oracle for computing $Q_{p_j}$ for all $p_j$ in the pattern, the query complexity of
our quantum pattern matching algorithm is $O(\sqrt{N})$, with no dependence on $M$, apart from the
cost of setting up the initial state $|\psi_0\rangle$. But to transform this interesting
theoretical result into a useful application, we now proceed to describe in detail how to build
the query functions and how to generate our non-trivial initial state, thus giving the full recipe
to implement our algorithm together with its total circuit complexity.

The quantum circuit for the query operator is obtained by implementing a permutation operator.
As has already been noted, for any Boolean function of $n$ bits $f : \{0,1\}^n \to \{0,1\}$ we
are able to construct a bijection $\tilde{f}$ on $n + 1$ bits such that:

$$\tilde{f}(0, x_1, \ldots, x_n) = (f(x_1, \ldots, x_n), x_1, \ldots, x_n)$$
$$\tilde{f}(1, x_1, \ldots, x_n) = (1 - f(x_1, \ldots, x_n), x_1, \ldots, x_n). \qquad (6)$$

In the corresponding quantum case we have a Hilbert space $\mathcal{H}$ of $n + 1$ qubits and
$\tilde{f}$ induces a unitary transformation $U$ where:

$$U|x_0, x_1, \ldots, x_n\rangle = |\tilde{f}(x_0, x_1, \ldots, x_n)\rangle. \qquad (7)$$

Note that $U$ is the quantum implementation of the original Boolean function $f$ we wish to
calculate. The value of the function is stored in the first qubit and the rest are ignored.
Moreover, note that $U$ is simply a permutation over the computational basis of $\mathcal{H}$,
and therefore can be obtained by composing at most $2^{n+1} - 1$ transpositions (i.e.
permutations of only two elements keeping the remaining ones unchanged), that is,
$U = U_1 U_2 \cdots U_k$ with $k < 2^{n+1}$, where each $U_i$ acts on only two elements of the basis.
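
The extension of a Boolean function to a bijection on $n + 1$ bits can be tabulated classically. The sketch below builds the lookup table of the map that XORs $f(x_1, \ldots, x_n)$ into the first bit and leaves the other bits unchanged, which is the construction described above; the function name and the tuple representation of bit strings are illustrative choices.

```python
from itertools import product


def extend_to_bijection(f, n):
    """Table of (x0, x1..xn) -> (x0 XOR f(x1..xn), x1..xn) over all (n+1)-bit tuples."""
    table = {}
    for bits in product((0, 1), repeat=n + 1):
        x0, xs = bits[0], bits[1:]
        table[bits] = (x0 ^ f(*xs),) + xs
    return table
```

In this table every input with $f(x_1, \ldots, x_n) = 1$ is exchanged with the input that differs only in the first bit, and all other inputs are fixed, so the permutation is a product of disjoint transpositions, one per satisfying assignment of $f$.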

Finally, using Gray codes, we are able to implement each $U_i$ using at most $O(n^2)$ C-NOT and
Pauli-X gates. In detail, recall that given two distinct binary words $\ell$ and $\ell'$ of the
same size $s$, a Gray code from $\ell$ to $\ell'$ is a sequence of binary words
$r_0, \ldots, r_k$ such that $r_0 = \ell$, $r_k = \ell'$ and $r_{j-1}$ differs in only one bit
from $r_j$ for any $j \in \{1, \ldots, k\}$. Note that $k$ is less than or equal to $s$, the size
of the binary words. Given a Gray code we are able to build the circuit to shift $\ell$ to
$\ell'$, by applying in sequence a controlled swap operation to the bit distinguishing $r_{j-1}$
from $r_j$ for all $j \in \{1, \ldots, k\}$. Although the obtained permutation maps
$|\ell\rangle$ to $|\ell'\rangle$, it is not true in general that $|\ell'\rangle$ is mapped to
$|\ell\rangle$. Indeed, $|\ell'\rangle$ is mapped to $|r_{k-1}\rangle$, and $|r_j\rangle$ is
mapped to $|r_{j-1}\rangle$ for any $j \in \{1, \ldots, k\}$. In order to obtain a transposition,
we need to map $|r_{k-1}\rangle$ to $|\ell\rangle = |r_0\rangle$ and $|r_j\rangle$ to
$|r_{j+1}\rangle$ for all $j \in \{0, \ldots, k-2\}$. This can be achieved again by considering
the Gray code from $r_{k-1}$ to $r_0 = \ell$, which is precisely $r_{k-1}, r_{k-2}, \ldots, r_0$.
Again, by applying in sequence a controlled swap operation to the bit distinguishing $r_j$ from
$r_{j-1}$ for all $j \in \{k-1, \ldots, 1\}$ we attain the desired transposition. Observe that for
the particular case of implementing the transposition $U_i$ on $\mathcal{H}$, the size $s$ of the
words is $n + 1$.
In summary, given a transposition $U_i$ that transposes $|\ell\rangle$ with $|\ell'\rangle$, and
letting $r_0, \ldots, r_k$ be the Gray code from $\ell$ to $\ell'$, the algorithm implementing
$U_i$ can be obtained as follows:

Input: $|\psi_0\rangle \in \mathcal{H}$
Output: $U_i|\psi_0\rangle$
Classical variable: $i \in \mathbb{N}$
Quantum variable: $|\psi\rangle \in \mathcal{H}$

1. swap $|\psi\rangle$ with $|\psi_0\rangle$;

2. for $i = 1$ to $k$

(a) set $|\psi\rangle = \text{C-NOT}(r_{i-1}, r_i)|\psi\rangle$;

3. for $i = k - 1$ to $1$

(a) set $|\psi\rangle = \text{C-NOT}(r_{i-1}, r_i)|\psi\rangle$;

where the non-trivial gate $\text{C-NOT}(r_{i-1}, r_i)$ is the transposition such that:

$$\text{C-NOT}(r_{i-1}, r_i)|r_i\rangle = |r_{i-1}\rangle$$
$$\text{C-NOT}(r_{i-1}, r_i)|r_{i-1}\rangle = |r_i\rangle \qquad (8)$$
$$\text{C-NOT}(r_{i-1}, r_i)|w\rangle = |w\rangle \text{ for all } w \neq r_{i-1}, r_i.$$

A canonical construction of such controlled gates, requiring $O(n)$ elementary gates, can be
found in the literature.
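
The two passes of the procedure can be verified by classical bookkeeping on basis words. The sketch below treats each controlled gate as the basis-state transposition of equation (8), uses one simple Gray path that flips the differing bits from least to most significant (an assumption; any Gray code between the two words would do), and tracks the image of every basis word. All names are illustrative.

```python
def gray_path(a, b, s):
    """Words r_0 = a, ..., r_k = b on s bits, flipping one differing bit at a time."""
    path, cur = [a], a
    for bit in range(s):
        if ((cur ^ b) >> bit) & 1:
            cur ^= 1 << bit
            path.append(cur)
    return path


def swap_pair(perm, u, v):
    """Apply the transposition exchanging basis words u and v to the current images."""
    return [v if x == u else u if x == v else x for x in perm]


def transposition_via_gray(a, b, s):
    """Image of each basis word after steps 2 and 3 of the procedure."""
    r = gray_path(a, b, s)
    k = len(r) - 1
    perm = list(range(2 ** s))           # perm[x] = current image of |x>
    for i in range(1, k + 1):            # step 2: i = 1 .. k
        perm = swap_pair(perm, r[i - 1], r[i])
    for i in range(k - 1, 0, -1):        # step 3: i = k-1 .. 1
        perm = swap_pair(perm, r[i - 1], r[i])
    return perm
```

For every pair of distinct words the resulting list exchanges the two words and fixes all others, using $2k - 1$ single-bit transpositions with $k \le s$.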

We conclude that any Boolean function of $n$ bits can be implemented using $O(n^2 2^n)$ C-NOT
gates and $O(2^n)$ Pauli-X gates. This means that a query quantum circuit for inspecting a list
of $N$ elements can be built using $O(N \log^2 N) = \tilde{O}(N)$ gates.

The overall circuit complexity of our quantum pattern matching algorithm, excluding the initial
setup, is $O(N^{3/2} \log^2(N) \log(M))$, versus $O(MN^2)$ for a classical circuit (note that one
needs $O(N)$ classical Boolean gates to produce a circuit that reads an arbitrary database of
size $N$).

It remains to explain the cost of setting up the initial state $|\psi_0\rangle$ given by
equation (1), assuming that initially all qubits are set to $|0\rangle$. Since we need $M$
variables ranging from 1 to $N$, we will require $M \log(N)$ qubits to encode the quantum state
of the program. We assume that $N - M = 2^s$ for some positive integer $s$ (if this is not the
case, we can augment the size $N$ of the string until this desideratum is fulfilled and assume
that no letter occurs in the augmented part of the string).

We start by creating a uniform superposition of the $s$ qubits encoding the position of the
first symbol of the pattern $p$. This is obtained by simply applying a Hadamard gate to each of
these qubits, as shown in Fig. 2 for $s = 3$, and thus it can be achieved in $O(s)$. The next
step is to entangle these qubits with the ones encoding the position of the second symbol of
$p$, and so on. We detail the process for the second symbol of $p$; the final state is obtained
by iterating this process $M - 2$ times.



First we create the state $\sum_{i=0}^{2^s-1} |i, i\rangle$ (up to normalization), which can be
achieved by applying controlled Pauli-X gates, as depicted in the box of Fig. 2 for $s = 3$.
Finally, to obtain the particular sequence encoding the order of the symbols of $p$, we apply a
sequence of $O(\log^2(N - M))$ multi-controlled Pauli-X gates, as shown in Fig. 2. These
multi-controlled Pauli-X gates can be implemented using $O(\log(N - M))$ C-NOT and Pauli-X gates,
and thus the overall circuit complexity to construct this particular entanglement is
$O(\log^3(N - M))$.
FIG. 2: Core of the circuit to generate the initial state $|\psi_0\rangle$ given by equation (1).
The first/second set of $s$ lines (in this example, $s = 3$) represent the qubits encoding the
position of the first/second symbol of the pattern. For the third symbol, we apply this same
circuit, excluding the Hadamard operations, to a new set of $s$ qubits, controlled by the qubits
of the second symbol, and so on. This procedure must then be iterated another $M - 3$ times,
yielding an overall complexity of $O(M \log^3(N - M))$.

The iteration is such that the $s$ qubits encoding the second symbol of $p$ are then used to
control the qubits encoding the third symbol of $p$, and so on. Hence, the overall circuit
complexity to construct the initial state given by equation (1) is $O(M \log^3(N - M))$.

We conclude that our algorithm has an efficient compile time of $O(N \log^2(N) \times |\Sigma|)$
and a total run time of $O(M \log^3(N) + N^{3/2} \log^2(N) \log(M))$.

In summary, we have presented a quantum algorithm for closest pattern matching that not only
makes the quantum search in (long unsorted) static databases realistic, but even interesting, as
it offers a faster solution than what is known classically for this important problem. Based on
a compile once, run many times approach, our algorithm allows for an arbitrary number of
different searches on the same string, and offers a query complexity of $O(\sqrt{N})$ in the most
relevant limit where the size $M$ of the pattern is much smaller than the size $N$ of the
database. Only the cost of setting up the initial state shows a dependence on $M$. Furthermore,
we gave the details of how to obtain the full quantum circuit that implements our algorithm, thus
offering an oracle-based quantum algorithm ready to be implemented.